Authorship Attribution in Bengali Language
Abstract
We describe authorship attribution of Bengali literary text. Our contributions include a new corpus of 3,000 passages written by three Bengali authors, an end-to-end system for authorship classification based on character n-grams, feature selection for authorship attribution, feature ranking and analysis, and a learning curve that relates the amount of training data to test accuracy. We achieve state-of-the-art results on a held-out dataset, indicating that lexical n-gram features are unarguably the best discriminators for authorship attribution of Bengali literary text.
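As a rough illustration of classification based on character n-grams, the sketch below builds one n-gram count profile per author and assigns a new passage to the author whose profile is most similar under cosine similarity. The abstract does not say which classifier or n-gram length the system uses, so the nearest-profile rule, the default `n=3`, the function names, and the toy English passages are all assumptions made for this example. Python strings are indexed by Unicode code point, so the same functions accept Bengali text unchanged.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (by Unicode code point)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_profiles(passages_by_author, n=3):
    """Sum the n-gram counts of every training passage, per author."""
    profiles = {}
    for author, passages in passages_by_author.items():
        profile = Counter()
        for passage in passages:
            profile.update(char_ngrams(passage, n))
        profiles[author] = profile
    return profiles


def predict(profiles, text, n=3):
    """Return the author whose profile is closest to the passage."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(profiles[author], query))


# Toy data (hypothetical, for illustration only; not from the corpus).
toy_corpus = {
    "author_a": ["the river runs quietly", "the river bends slowly"],
    "author_b": ["zinc boxes jump quickly", "zinc foxes jump oddly"],
}
profiles = train_profiles(toy_corpus)
print(predict(profiles, "the river sleeps"))
```

A learning curve of the kind the abstract mentions could be approximated by calling `train_profiles` on growing subsets of the training passages and recording held-out accuracy at each size.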
Similar resources
A Supervised Authorship Attribution Framework for Bengali Language
Authorship Attribution is a long-standing problem in Natural Language Processing. Several statistical and computational methods have been used to find a solution to this problem. In this paper, we have proposed methods to deal with the authorship attribution problem in Bengali. More specifically, we proposed a supervised framework consisting of lexical and shallow features, and investigated the...
Authorship Identification in Bengali Literature: a Comparative Analysis
COLING 2012, Mumbai, December 2012. Authorship Identification in Bengali Literature: a Comparative Analysis. Tanmoy Chakraborty, Department of Computer Science & Engineering, Indian Institute of Technology, Kharagpur, India. its_tanmoy@cse.iitkgp.ernet.in. Abstract: Stylometry is the study of the unique linguistic styles and writing behaviors of individuals. It belongs to the core task of text categorizatio...
A Survey on Authorship Analysis
The paper discusses the problem of authorship analysis and its different types, such as authorship attribution, authorship identification, authorship profiling, and plagiarism detection. It also addresses the issues that arise in Indian-language text. Keywords: authorship attribution, authorship profiling, plagiarism detection, text classification.
Authorship Attribution Using Word Network Features
In this paper, we explore a set of novel features for authorship attribution of documents. These features are derived from a word network representation of natural language text. As has been noted in previous studies, natural language tends to show complex network structure at word level, with low degrees of separation and scale-free (power law) degree distribution. There has also been work on ...
Cross-Language Authorship Attribution
This paper presents a novel task of cross-language authorship attribution (CLAA), an extension of the authorship attribution task to multilingual settings: given data labelled with authors in language X, the objective is to determine the author of a document written in language Y, where X ≠ Y. We propose a number of cross-language stylometric features for the task of CLAA, such as those based o...